nlp_architect.data.sequential_tagging.CONLL2000

class nlp_architect.data.sequential_tagging.CONLL2000(data_path, sentence_length=None, max_word_length=None, extract_chars=False, lowercase=True)

CoNLL-2000 POS tagging and chunking task data set (NumPy-based)

Parameters
  • data_path (str) – directory containing the CONLL2000 files (train.txt and test.txt)

  • sentence_length (int, optional) – number of time steps (tokens per sentence) used to embed the data. If None, vectors are not truncated

  • max_word_length (int, optional) – maximum word length in characters. If None, vectors are not truncated

  • extract_chars (bool, optional) – yield character-level (Char RNN) features

  • lowercase (bool, optional) – lowercase the words of each sentence
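To make the effect of these options concrete, here is a minimal sketch applied to a single tokenized sentence. It is not the library's implementation: `prepare_sentence` is a hypothetical helper, and taking the character features from the original casing is an assumption made only for illustration.

```python
def prepare_sentence(words, sentence_length=None, max_word_length=None,
                     extract_chars=False, lowercase=True):
    """Hypothetical helper that mirrors the documented options on one sentence."""
    chars = None
    if extract_chars:
        # Assumption: characters come from the words as given, before lowercasing.
        # Slicing with None keeps the whole word, so None means "do not truncate".
        chars = [list(w[:max_word_length]) for w in words]
    if lowercase:
        words = [w.lower() for w in words]
    if sentence_length is not None:
        # Keep only the first `sentence_length` time steps.
        words = words[:sentence_length]
        if chars is not None:
            chars = chars[:sentence_length]
    return words, chars


words, chars = prepare_sentence(
    ["He", "reckons", "the", "deficit"],
    sentence_length=3, max_word_length=4, extract_chars=True,
)
print(words)  # ['he', 'reckons', 'the']
print(chars)  # [['H', 'e'], ['r', 'e', 'c', 'k'], ['t', 'h', 'e']]
```

With sentence_length=None and max_word_length=None the helper returns the sentence and its characters untruncated, matching the "None value will not truncate" behaviour described above.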

__init__(data_path, sentence_length=None, max_word_length=None, extract_chars=False, lowercase=True)

Initialize the data set object with the parameters listed above.

Methods

__init__(data_path[, sentence_length, …])

Initialize the data set object.

Attributes

char_vocab

character Vocabulary

chunk_vocab

chunk label Vocabulary

dataset_files

mapping of split name ('train', 'test') to file name

pos_vocab

pos label Vocabulary

test_set

get the test set

train_set

get the train set

word_vocab

word Vocabulary
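A usage sketch for the attributes listed above, under stated assumptions: the constructor call uses only the documented signature, the argument values (50, 20) are arbitrary, and `build_conll2000` and `collect_attributes` are hypothetical helper names. The import sits inside the function so the block still runs where nlp_architect is not installed; a stand-in object replaces the real data set for the demonstration.

```python
from types import SimpleNamespace


def build_conll2000(data_path):
    """Construct the data set; needs nlp_architect and the files in data_path."""
    from nlp_architect.data.sequential_tagging import CONLL2000
    return CONLL2000(data_path, sentence_length=50, max_word_length=20,
                     extract_chars=True, lowercase=True)


def collect_attributes(dataset):
    """Read the documented attributes into a plain dict."""
    names = ("train_set", "test_set", "word_vocab",
             "pos_vocab", "chunk_vocab", "char_vocab")
    return {name: getattr(dataset, name) for name in names}


# Stand-in object so the sketch runs without the library or the data files.
stub = SimpleNamespace(train_set=[], test_set=[], word_vocab={},
                       pos_vocab={}, chunk_vocab={}, char_vocab={})
attrs = collect_attributes(stub)
print(sorted(attrs))
```

With the library and data available, `collect_attributes(build_conll2000("/path/to/conll2000"))` would read the same attributes from a real instance; the path shown is a placeholder.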

dataset_files = {'test': 'test.txt', 'train': 'train.txt'}
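The mapping above implies that each split lives in a file under data_path. The sketch below resolves those paths and reads the standard CoNLL-2000 layout (one `word POS chunk` triple per line, blank lines between sentences). `split_path` and `read_conll2000` are hypothetical helpers written for illustration, not the class's own loader.

```python
import os

# Copied from the documented class attribute.
DATASET_FILES = {'test': 'test.txt', 'train': 'train.txt'}


def split_path(data_path, split):
    """Return the file path for a split ('train' or 'test')."""
    return os.path.join(data_path, DATASET_FILES[split])


def read_conll2000(path):
    """Parse a CoNLL-2000 file into sentences of (word, pos, chunk) tuples."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if not fields:
                # A blank line closes the current sentence.
                if current:
                    sentences.append(current)
                    current = []
                continue
            word, pos, chunk = fields[:3]
            current.append((word, pos, chunk))
    if current:
        # The file may end without a trailing blank line.
        sentences.append(current)
    return sentences
```

For example, `read_conll2000(split_path(data_path, 'train'))` returns one list of tuples per sentence in train.txt.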